Our team explores MLB batting performance indicators using data from Baseball Savant.
Decision-maker / Context:
Research Question:
Impact:
Primary Datasets:
MLB team stats 2021-2025.csv
MLB player stats 2021-2025.csv
Data Metrics Glossary:
1) Player-level Metrics
| Abbreviation | Full Name / Description |
|---|---|
| last_name | Player’s last name |
| first_name | Player’s first name |
| player_id | MLB’s unique ID for the player, used for data merging |
| year | Season year the stats come from |
| player_age | Player’s age during that season |
| pa | Plate appearances |
| hit | Total hits recorded |
| k_percent | Percentage of plate appearances that end in a strikeout |
| bb_percent | Percentage of plate appearances that end in a walk |
| batting_avg | Batting average: how often the player gets a hit per at-bat |
| slg_percent | Slugging percentage: measures hitting power by counting total bases per at-bat |
| on_base_percent | How often the player reaches base by any method (hit, walk, hit-by-pitch) |
| on_base_plus_slg | OPS: combined measure of getting on base (OBP) and hitting for power (SLG) |
| isolated_power | ISO: slugging percentage minus batting average; measures raw power as extra bases per at-bat |
| babip | Batting average on balls in play: how often a ball put in play becomes a hit (excl. HR & strikeouts) |
| b_rbi | Runs batted in (RBI): runs scored as a result of the batter’s plate appearances |
| xba | Expected batting average based on batted-ball quality |
| xslg | Expected slugging percentage based on batted-ball quality |
| woba | Weighted on-base average: advanced metric measuring total offensive value using run-values for each event |
| xwoba | Expected wOBA based on contact quality |
| xobp | Expected on-base percentage |
| xiso | Expected ISO (expected power output) |
| wobacon | wOBA on contact only (ignores strikeouts and walks) |
| xwobacon | Expected wOBA on contact based on batted-ball data |
| exit_velocity_avg | Average exit velocity (mph) of all balls the player hits |
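
Several glossary entries are simple functions of others, which gives a cheap consistency check on the downloaded data. The sketch below recomputes ISO and OPS from `batting_avg`, `slg_percent`, and `on_base_percent` using the standard definitions (ISO = SLG − AVG, OPS = OBP + SLG). The function name and the sample stat line are hypothetical and not taken from the project code or dataset.

```python
def derived_metrics(batting_avg, slg_percent, on_base_percent):
    """Recompute ISO and OPS from the three rate stats they are built on."""
    isolated_power = slg_percent - batting_avg        # ISO = SLG - AVG
    on_base_plus_slg = on_base_percent + slg_percent  # OPS = OBP + SLG
    return round(isolated_power, 3), round(on_base_plus_slg, 3)


# Hypothetical stat line, not taken from the dataset.
iso, ops = derived_metrics(batting_avg=0.300, slg_percent=0.500, on_base_percent=0.400)
print(iso, ops)
```

Comparing these recomputed values with the `isolated_power` and `on_base_plus_slg` columns (allowing for rounding) can flag rows that were corrupted during cleaning or merging.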
2) Team-level Metrics:
| Abbreviation | Full Name / Description |
|---|---|
| Win% | Team winning percentage over the season |
| SLG | Slugging percentage: measures hitting power as total bases per at-bat |
| OBP | On-base percentage: how often the team reaches base |
| RBI | Runs batted in: total runs driven in by the team’s batters |
| ISO | Isolated power: SLG minus batting average; extra-base hitting ability |
| OPS | On-base plus slugging: combined measure of offensive value |
Data characteristics:
Summary statistics:
These plots show projected team-level KPIs for the top 10 teams over the next three seasons (2026–2028), based on linear regression with correlated predictors.
Top Teams Selected: Based on 2025 Win%
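As a rough illustration of how such a projection can be produced, the sketch below selects teams by 2025 Win% and extrapolates one KPI with an ordinary least-squares line. It is a simplification: it uses the season year as the only predictor, whereas the project's regression also draws on correlated predictors. The function names and all numeric values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(history, future_years):
    """Extrapolate a {year: value} history to the requested future years."""
    years = sorted(history)
    slope, intercept = fit_line(years, [history[y] for y in years])
    return {year: slope * year + intercept for year in future_years}


def top_teams(win_pct_2025, n=10):
    """Return the n team names with the highest 2025 Win%."""
    return sorted(win_pct_2025, key=win_pct_2025.get, reverse=True)[:n]


# Hypothetical Win% history for one team (illustrative values only).
history = {2021: 0.50, 2022: 0.51, 2023: 0.52, 2024: 0.53, 2025: 0.54}
projection = project(history, [2026, 2027, 2028])
for year, value in projection.items():
    print(year, round(value, 3))
```

With only five seasons per team, a straight-line fit is sensitive to a single unusual year, so the projected values are best read as trend indicators rather than point forecasts.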
Graphs and Captions:
Projected Win Percentage (2026–2028)
Description: Projected win percentage for top MLB teams. Shows
expected growth/decline in team performance over three years.
Projected SLG (2026–2028)
Description: Shows projected slugging performance of top teams.
Patterns indicate which teams may improve power hitting.
Projected OBP (2026–2028)
Description: Team on-base percentage projection, reflecting
plate discipline and consistency.
Projected RBI (2026–2028)
Description: Estimated run production by team, linked to
scoring potential.
Projected ISO (2026–2028)
Description: Measures team isolated power; highlights teams
likely to hit extra-base hits.
Projected OPS (2026–2028)
Description: Combined on-base plus slugging metric, giving a
broad measure of offensive efficiency.
These plots show projected player-level KPIs for the top 10 players based on prior performance and correlated metrics.
Graphs and Captions:
Projected SLG for Top Players
Description: Expected slugging trends for top players.
Highlights consistency and potential breakout performers.
Projected OPS for Top Players
Description: Combines OBP and SLG for a holistic view of player
offensive output.
Projected ISO for Top Players
Description: Player isolated power trends, indicating
extra-base hitting capability.
Projected Batting Average for Top Players
Description: Tracks expected hit rate per at-bat for top
players.
Projected OBP for Top Players
Description: Player on-base percentage projections, reflecting
consistency and plate discipline.
Projected RBI for Top Players
Description: Expected run production per player, linked to
scoring potential.
Projected Exit Velocity for Top Players
Description: Exit velocity trend predictions; higher values
often correlate with power hitting.
These plots show relationships between exit velocity and player hitting stats, and overall correlations among KPIs.
EV vs SLG:
EV vs RBI:
EV vs OPS:
EV vs OBP:
EV vs ISO:
EV Correlation Heatmap vs Hitting Stats:
Team Rankings Heatmap (2021–2025 averages):
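One way the rankings behind such a heatmap could be derived is to average each KPI per team over the five seasons and then rank the averages. The sketch below does this for a single metric. The team labels, values, and function names are placeholders, not project data or code.

```python
def average_by_team(seasons):
    """Map {team: [value per season]} to {team: mean value}."""
    return {team: sum(values) / len(values) for team, values in seasons.items()}


def rank_desc(averages):
    """Rank 1 = highest average; ties are broken alphabetically by team name."""
    ordered = sorted(averages, key=lambda team: (-averages[team], team))
    return {team: position for position, team in enumerate(ordered, start=1)}


# Hypothetical 2021-2025 OPS values for three placeholder teams.
ops_by_team = {
    "Team A": [0.700, 0.720, 0.740, 0.710, 0.730],
    "Team B": [0.760, 0.750, 0.770, 0.780, 0.790],
    "Team C": [0.690, 0.700, 0.680, 0.710, 0.720],
}
ranks = rank_desc(average_by_team(ops_by_team))
print(ranks)
```

Repeating the ranking for each KPI yields a team-by-metric grid of ranks that can be colored as a heatmap.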
Our analysis combined historical MLB player- and team-level statistics with regression-based projections to estimate future offensive performance for 2026–2028. By examining KPIs such as SLG, OBP, OPS, ISO, RBI, Batting Average, and Exit Velocity, we generated insights into which metrics are most indicative of future success.
Key Findings:
OPS and ISO consistently emerge as the strongest indicators of future offensive performance at both the team and player levels. Teams and players maintaining strong OPS trends also show positive projections in Win%, RBI, and SLG.
Exit Velocity correlates moderately with the power metrics (ISO ≈ 0.65, SLG ≈ 0.62) but does not correlate strongly with most other KPIs. EV is therefore useful, but not a primary predictor in our dataset.
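To make the "moderate" reading concrete, the sketch below shows how a correlation coefficient is computed and what the reported values imply. It assumes the figures above are Pearson coefficients, which the write-up does not state explicitly. Squaring them gives the share of variance a straight-line fit on exit velocity alone would account for. The `pearson` helper is illustrative and is not the project's code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Correlations quoted in the finding above (assumed to be Pearson r).
reported_r = {"ISO": 0.65, "SLG": 0.62}
shared_variance = {metric: round(r ** 2, 2) for metric, r in reported_r.items()}
print(shared_variance)  # {'ISO': 0.42, 'SLG': 0.38}
```

Under that assumption, exit velocity alone accounts for roughly 40% of the variation in ISO and SLG, which is consistent with treating it as a supporting rather than a primary predictor.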
Team Projections (2026–2028):
Player Projections (2026–2028):
Heatmaps and Scatter Plots:
Limitations:
The model only uses batting statistics; defensive and situational factors are not included.
Projections are based on linear regression, which may not capture sudden changes (injuries, role changes, coaching changes, etc.).
Future Work:
Add park effects, defensive WAR, sprint speed, pitch-level Statcast features, and injury history, and move to multivariate models.
Use machine learning methods for improved forecasting accuracy.
Hypothetical Decision:
Data-driven Recommendation:
Benefits:
Risks:
Challenges:
Victories:
| Member | Role | Contribution |
|---|---|---|
| Matthew G Gonzalez | Project Lead / Co-Head Developer | Data cleansing, writing code, visualization, findings and write-up |
| Jacob D Lamothe | Code Editor/Checker, Video Editor, Presentation/Narration Lead | Checks code for mistakes and redundancies, statistical validation, edits the video at the end of the project |
| Rodolfo Lazaro | Visualization Designer | Tableau plots, checking visualizations |
| Samir Soliman | Head Developer | Data import, writing code, statistical validation and model evaluation, findings and write-up |
output/.Group_3_Checkpoint_2.0
branch regularly. We selected Option 1: Baseball Savant MLB performance indicators.
Repository created. The team is collaborating on GitHub using a Commit → Pull → Push workflow.
Data downloaded from baseballsavant.mlb.com
Python and Tableau are used for analysis and visualization: scatter plots for relationships between variables, line plots for performance trends over the years, boxplots for distributions of key metrics, and histograms for frequency distributions of variables.
The team is working on KPIs and forecasts for On-Base Percentage (OBP), Slugging Percentage (SLG), Isolated Power (ISO), Batting Average, and Exit Velocity.
This README.md is the Checkpoint 2 deliverable.
The recorded video will show results with narration.
The final report will be displayed in the README.md file on the repo landing page.